Carbon Monitor Cities near-real-time daily estimates of CO2 emissions from 1500 cities worldwide

Building on near-real-time and spatially explicit estimates of daily carbon dioxide (CO2) emissions, here we present and analyze a new city-level dataset of fossil fuel and cement emissions, Carbon Monitor Cities, which provides daily estimates of emissions from January 2019 through December 2021 for 1500 cities in 46 countries and disaggregates five sectors: power generation, residential (buildings), industry, ground transportation, and aviation. The goal of this dataset is to improve the timeliness and temporal resolution of city-level emission inventories. It includes estimates for both functional urban areas and city administrative areas that are consistent with global and regional totals. Comparisons with other datasets (i.e., CEADs, MEIC, Vulcan, and CDP-ICLEI Track) were performed, and we estimate the overall annual uncertainty range to be ±21.7%. Carbon Monitor Cities is a near-real-time, city-level emission dataset that covers cities around the world, including the first estimates for many cities in low-income countries.


Background & Summary
More than 60% of global fossil-fuel CO2 emissions are produced in cities 1,2, and high-quality city-level emissions inventories are urgently needed to support international climate mitigation efforts [3][4][5]. For example, many cities have adopted goals of reaching net-zero emissions by 2030 or 2050, which require them to monitor and report emissions on a timely basis 6. Unfortunately, a global, open, and harmonized dataset of city-level emission inventories is still lacking 7,8. Instead, most CO2 emission inventories are conducted at the country level, as city-level fossil fuel consumption data are more difficult to acquire 9. Furthermore, many inventories, including national inventories reported to the United Nations Framework Convention on Climate Change (UNFCCC), often lag reality by one year or more 10,11. Thus, many city-level mitigation efforts are hampered by a lack of timely and high-quality emissions data with which to set benchmarks and monitor progress [12][13][14][15].
City-level CO 2 emissions may refer to either the CO 2 emissions produced within the territory of a city or emissions related to all the goods and services consumed in a city, which often include substantial emissions produced outside the city boundary 8,16,17 . The in-boundary emissions are typically referred to as scope 1, emissions from imported electricity as scope 2, and all other trans-boundary emissions associated with other city activities are referred to as scope 3 17,18 . Three conventional approaches have been used to attribute CO 2 emissions to cities: purely geographic production-based accounting, community infrastructure-based accounting , which also relies on other spatial data like population density and road networks. Downscaling has been used to construct multiple city-level datasets that cover a large number of cities 17,21 , and we adopt a similar approach in this study.

Methods
Workflow. Carbon Monitor Cities (CM-Cities) is downscaled from the Carbon Monitor, a near-real-time (NRT) national-level emission dataset with global coverage 10. Specifically, CM-Cities is produced following a four-stage workflow (Fig. 1). The first stage mainly involves the construction of the Global Gridded Daily CO2 Emission Datasets (GRACED) 40, which are daily emission maps generated by spatializing Carbon Monitor daily emissions using the Global Carbon Grid (GID), the Emissions Database for Global Atmospheric Research (EDGAR), and the TROPOspheric Monitoring Instrument (TROPOMI). GRACED covers seven sectors (power, industry, residential and commercial buildings, ground transportation, domestic aviation, international aviation, and international shipping) and provides NRT emission maps for fossil fuel combustion and cement production with a global spatial resolution of 0.1° by 0.1° and a temporal resolution of one day. GRACED is an intermediate gridded dataset between the Carbon Monitor and CM-Cities, and the methods for generating it are described in a later section.
In the second stage, we disaggregated the gridded daily emissions into cities based on two types of city areas: Global Administrative Areas (GADM) and Functional Urban Areas (FUA) to address the definition differences of "a city" in different countries. The FUA is defined by the Organisation for Economic Co-operation and Development (OECD) and the European Union as the high-density urban centres plus their surrounding commuting zones 41 . For OECD countries, we used the OECD FUA, which provides higher quality FUA for OECD countries (https://www.oecd.org/regional/regional-statistics/functional-urban-areas.htm 42 ). For other countries, the Global Human Settlement FUA is used (https://ghsl.jrc.ec.europa.eu/ghs_fua.php 43 ). The GADM level-2 administrative areas are used for prefecture-level cities in China and counties in the United States. The details of the features and the usage of FUA and GADM datasets are described in later sections. The spatial downscaling/disaggregation is performed by first converting the FUA and GADM shapefiles into raster datasets with a unique ID assigned to each city. Then the raster city area maps are used as masks to extract the matching grid cells in the GRACED emission maps. We then aggregate emission values for grid cells that correspond to the same city mask to yield the total sectoral emission value for a given city.
In the third stage, we use city-level data to correct for the residential and ground transport sectors to address the bias in raw city-level inventories from the second stage. We use city-specific TomTom daily transport congestion data and daily heating degree days (HDD) for the corrections.
The fourth stage involves error correction and data validation. We first identify and remove outliers (which are mostly errors introduced by previous processing steps and/or from the source data) using statistical approaches. We then collected city-level inventories from other datasets (mostly annual data) and compared them to our results for validation. The detailed procedures for these processes are described in later sections.
CM-Cities currently includes city-level emission inventories from 01/01/2019 to 31/12/2021 for five main sectors: 1. power generation, 2. residential and commercial buildings, 3. industrial production, 4. ground transportation, and 5. aviation. These five sectors combined account for over 70% of fossil fuel CO 2 emissions from a city 33 . Custom code used in this work is described in the Code Availability section.
Coverage. CM-Cities covers 1500 cities in 46 countries (Fig. 2). Most of the cities are clustered in Europe, Asia, and North and South America. Major cities in Oceania and Africa are also included. Figure 2 also compares the FUA and the GADM boundaries for Los Angeles (US), Hangzhou (China), and Melbourne (Australia). The FUA typically covers a larger area than the administrative area, but in some countries, such as China, the FUA is typically smaller than the administrative city area. The use of both area definitions facilitates dataset comparisons, as highlighted for cities in China and the United States. These two spatial scopes also provide critical information for differentiating administrative emissions from community-wide emissions.
Near-real-time daily emissions by sector. CM-Cities is downscaled from the GRACED dataset, which combines the spatial distribution and the daily variations of emissions. This section describes the methods for estimating NRT daily emissions from a temporal perspective, and the next section describes the spatial gridding procedure. The estimation of daily emission variations follows the Carbon Monitor national dataset 10,36,44, which provides daily fossil fuel CO2 emissions since January 1st, 2019 at the global and national levels, with detailed estimates for seven main sectors: power, industry, ground transport, residential (including commercial), domestic aviation, international aviation, and international shipping. Emissions from international bunkers (the international aviation and international shipping sectors) are only accounted for at the global level and are usually excluded from national territorial emissions according to the IPCC guidelines. Therefore, CM-Cities considers the other five sectors: power, industry, ground transport, residential, and domestic aviation.

Table 2. Data sources for industrial production.
Power sector. Daily power generation data are acquired from multiple open data sources depending on the country (Table 1). These sources provide live power generation data with a daily or hourly resolution and together account for more than 70% of the total CO2 emissions in the power sector 10. The emission factors are estimated by dividing EDGAR's electricity emissions by our collection of coal-fired electricity data for various countries. The daily emissions are estimated as:

$$Emis_{power} = AD_{power} \times EF_{power} \qquad (1)$$

where AD is the power generation and EF is the emission factor. For emissions from other countries (countries not listed in Table 1), we assumed a linear relationship between the daily global emissions and the daily total emissions from these countries, and then adjusted the emissions for countries that adopted lockdown measures during the COVID-19 pandemic, following the method used by the Carbon Monitor national dataset 10.
Industry sector. For the industry sector, the daily emissions are calculated from the monthly industrial production index and the daily power generation data. Monthly industrial production data are acquired from several datasets (Table 2). The monthly CO2 emissions estimated from the Industrial Production Index (IPI) are then disaggregated to a daily scale using daily power generation data. This approach is based on two assumptions: 1. a linear relationship exists between daily industrial production and industrial fossil fuel use, and 2. a linear relationship exists between daily industry activity and daily electricity production 10. The monthly industry emissions are obtained by scaling the 2019 yearly emissions with the IPI (Eq. 2), and the daily emissions then follow from the share of daily in monthly electricity production:

$$Emis_{ind,daily} = Emis_{ind,monthly} \times \frac{Elec_{daily}}{Elec_{monthly}} \qquad (3)$$

where Emis ind, monthly, currentyear, c is the monthly industry emissions for country c in the current year, Emis ind, yearly, 2019, c is the yearly industry emissions for country c in 2019 (the year of the latest update of baseline emissions), and IPI is the corresponding Industrial Production Index. Emis ind, daily and Emis ind, monthly are the daily and monthly industry emissions, respectively. Elec daily and Elec monthly are the daily and monthly electricity production, respectively. For countries not listed in Table 2, the industry sector emissions are estimated in the same way as for the power sector.
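The daily disaggregation step can be illustrated with a minimal sketch. It only assumes the proportional relation stated above (daily emissions follow the daily share of monthly electricity production); the function name and the numbers are hypothetical and are not taken from the Carbon Monitor code.

```python
def daily_industry_emissions(emis_monthly, elec_daily):
    """Split one month's industry emissions across days by electricity share.

    emis_monthly: monthly industry emissions (any mass unit).
    elec_daily:   daily electricity production for every day of that month.
    """
    elec_monthly = sum(elec_daily)
    if elec_monthly <= 0:
        raise ValueError("monthly electricity production must be positive")
    return [emis_monthly * e / elec_monthly for e in elec_daily]


# Hypothetical example: a three-day "month" keeps the arithmetic easy to follow.
daily = daily_industry_emissions(30.0, [10.0, 20.0, 30.0])
```

By construction the daily values sum back to the monthly total, so the monthly constraint from the IPI is preserved.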
Ground transport sector. Daily emissions from ground transportation are estimated using the TomTom live congestion index and EDGAR road transportation emissions. The TomTom traffic congestion level represents the extra time spent on a trip in congested conditions, as a percentage, compared to uncongested conditions. TomTom congestion level data were obtained for more than 400 cities around the world at a temporal resolution of one hour (https://www.tomtom.com/traffic-index/). This approach permits the estimation of NRT emissions from ground transportation with a temporal resolution of up to one hour, and TomTom grants users permission for non-commercial usage. The TomTom live congestion level data have proven to be highly accurate for most cities, and Carbon Monitor has successfully adopted this approach 10. Note that a zero congestion level means that the traffic is fluid or "normal", not that there are no vehicles and zero emissions. The lower threshold of emissions when the congestion level is zero was estimated using real-time data from an average of 60 roads in the city of Paris. A sigmoid function-based regression (Eq. 4) relates the TomTom congestion level to the traffic volume, and Fig. 3 compares the actual and TomTom-estimated hourly car counts on the measured roads in Paris. The estimated traffic volume is then used to allocate the EDGAR on-road emissions to each day (Eq. 5):

$$Q_d = a + \frac{\beta X^{\gamma}}{\lambda^{\gamma} + X^{\gamma}} \qquad (4)$$

$$Emis_{trans,c,d} = Emis_{onroad} \times \frac{Q_d}{\sum_{d'=1}^{n} Q_{d'}} \qquad (5)$$

where Q d is the mean vehicle number per hour in day d, X is the daily mean TomTom congestion level, a, β, γ, and λ are regression parameters, Emis trans, c, d is the ground transport emissions in day d, Emis onroad is the annual EDGAR road transportation emissions, and n is the number of days in a year. For cities not covered by TomTom, we assumed that the emission changes follow the mean changes of the other cities in the country. If no city in the country has TomTom data, the relative emission changes are assumed to follow the same pattern as the total emissions from all TomTom-covered countries.
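The two-step transport estimate (congestion level to traffic volume, then traffic volume to a daily share of annual emissions) can be sketched as follows. The Hill-type sigmoid form used here is an assumption consistent with the four regression parameters named in the text; the parameter values and the three-day "year" are purely hypothetical.

```python
def traffic_volume(x, a, beta, gamma, lam):
    """Assumed sigmoid: mean hourly vehicle count for congestion level x (%).

    At x = 0 the function returns a, i.e. free-flowing traffic still has vehicles.
    """
    return a + beta * x**gamma / (lam**gamma + x**gamma)


def daily_transport_emissions(congestion, emis_onroad, a, beta, gamma, lam):
    """Allocate annual on-road emissions to days in proportion to traffic volume."""
    q = [traffic_volume(x, a, beta, gamma, lam) for x in congestion]
    total_q = sum(q)
    return [emis_onroad * q_d / total_q for q_d in q]


# Hypothetical parameters and congestion levels (not fitted values).
params = dict(a=100.0, beta=900.0, gamma=2.0, lam=20.0)
emis = daily_transport_emissions([0.0, 20.0, 40.0], 300.0, **params)
```

Because the allocation is a normalized share, a day with zero congestion still receives a non-zero emission, which matches the remark that fluid traffic does not mean zero vehicles.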
Residential sector. Carbon Monitor uses the fluctuation of air temperature to capture the daily variations in the energy consumption of residential and commercial buildings. The assumption associated with this method is that the heating demand, which is the largest contribution to the daily variability in emissions for this sector, is strongly governed by air temperature 45 , which determines the HDD (cooling in summer mainly consumes electricity that is covered in the power sector). This approach uses population-weighted HDD for different geographic locations for each day based on the ERA-5 reanalysis of air temperature 46 and also accounts for temperature-independent cooking emissions following EDGAR. The EDGAR residential emissions are then downscaled to daily values based on daily variations in population-weighted heating degree days.

$$Emis_{res,c,d} = Emis_{res,c,m} \times \left[ \frac{1 - R_{heating,c,m}}{N_m} + R_{heating,c,m} \times \frac{HDD_{c,d}}{\sum_{d' \in m} HDD_{c,d'}} \right] \qquad (6)$$

$$HDD_{c,d} = \sum_{g} R_{pop,g} \times (18 - T_{g,d}) \times He(18 - T_{g,d}) \qquad (7)$$

where Emis res, c, d and Emis res, c, m are the residential emissions for country c in day d and month m, respectively, R heating, c, m is the percentage of residential emissions from heating demand in country c in month m, HDD c,d is the population-weighted heating degree day for country c in day d, N m is the number of days in month m, R pop, g is the ratio of the population in grid g to the total national population, which is acquired from the Gridded Population of the World, version 4 47, He is the Heaviside step function, which sets the contribution of any negative temperature difference to zero, T g, d is the average air temperature in Celsius for grid g in day d at 2 meters derived from ERA5 46, and 18 is the HDD reference temperature of 18 °C.
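A minimal sketch of the residential downscaling described above, assuming the monthly emissions split into a temperature-independent part spread evenly over the days and a heating part distributed by daily HDD. The uniform fallback for months without any heating demand is an added assumption, and all names and numbers are hypothetical.

```python
def population_weighted_hdd(temps_c, pop_share, t_ref=18.0):
    """Population-weighted heating degree day for one day.

    temps_c:   2 m air temperature (deg C) per grid cell.
    pop_share: share of national population per grid cell (sums to 1).
    """
    return sum(w * max(t_ref - t, 0.0) for t, w in zip(temps_c, pop_share))


def daily_residential_emissions(emis_month, r_heating, hdd_daily):
    """Downscale monthly residential emissions to days.

    r_heating is the heating share of monthly emissions (0..1).
    """
    n_m = len(hdd_daily)
    hdd_sum = sum(hdd_daily)
    if hdd_sum == 0:
        # Assumption: with no heating demand, spread emissions evenly.
        return [emis_month / n_m] * n_m
    base = emis_month * (1.0 - r_heating) / n_m
    return [base + emis_month * r_heating * h / hdd_sum for h in hdd_daily]


hdd = population_weighted_hdd([10.0, 20.0], [0.5, 0.5])
res = daily_residential_emissions(30.0, 0.5, [1.0, 2.0, 3.0])
```

In this toy case the warm cell (20 °C) contributes nothing to the HDD, and the daily emissions still sum to the monthly total.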
Aviation sector. Emissions in the aviation sector are computed from individual commercial flight data from the Flightradar24 database (https://www.flightradar24.com). This sector covers domestic flights. All airports around a city were selected even if they are not part of the FUA, although some airports are not covered by the GADM. Because we follow a territorial approach for emission allocation, a city may have no airport of its own while aviation emissions are still present within its FUA boundary (e.g., the city of Dongguan does not have its own airport but has two nearby airports in Guangzhou and Shenzhen). In this case, we attributed to the city the daily pattern of the airport closest to it. The daily CO2 emissions were estimated as the product of the distance flown and a constant emission factor (EF avi):

$$Emis_{avi} = DF \times EF_{avi} \qquad (8)$$

where DF is the distance flown, which is computed as the great-circle distance between the take-off, cruising, descent, and landing points of each flight and accumulated over all flights. The emission factor per kilometer flown is assumed to be constant for the mix of all aircraft from an airport (including regional, narrowbody passenger, widebody passenger, and freight operations), as the share of flight types has not changed significantly since 2019.
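As an illustration of the distance-flown calculation, the sketch below sums haversine great-circle legs between consecutive flight waypoints and multiplies by a constant emission factor. The haversine formula and the mean Earth radius of 6371 km are standard; the waypoint format and the emission factor value are hypothetical.

```python
import math


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def flight_emissions(waypoints, ef_avi):
    """Emissions of one flight: summed leg distances times a constant factor.

    waypoints: ordered (lat, lon) pairs, e.g. take-off, cruise, descent, landing.
    ef_avi:    emission factor per km flown (hypothetical value in the example).
    """
    distance = sum(
        great_circle_km(*p, *q) for p, q in zip(waypoints, waypoints[1:])
    )
    return distance * ef_avi


# Hypothetical flight along the equator with a made-up emission factor.
example = flight_emissions([(0.0, 0.0), (0.0, 45.0), (0.0, 90.0)], ef_avi=2.0)
```

Daily airport totals would then be the sum of `flight_emissions` over all flights of that day.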
Gridded daily CO2 emissions. Carbon Monitor Cities disaggregates the Carbon Monitor national emissions to cities using the GRACED dataset developed by the Carbon Monitor team 40, which consists of emission maps generated by spatializing and gridding the daily national emission inventories from Carbon Monitor. This was achieved by estimating spatial distribution proxies from satellite data and existing gridded products while maintaining consistency between bottom-up accounting results and the spatial sum of the gridded results. Three datasets were used in producing GRACED: 1. the Global Carbon Grid (GID), which provides global CO2 emissions data from major industry and power plant point sources for 2019 at a resolution of 0.1°; 2. the Emissions Database for Global Atmospheric Research (EDGAR), which provides sectoral emissions as specified by the IPCC guidelines; and 3. the NO2 tropospheric vertical column density (TVCD) retrieval product from the TROPOspheric Monitoring Instrument (TROPOMI) onboard the Sentinel-5 Precursor satellite. Given that GID has higher data quality at fine spatial scales and for point sources of industries and power plants, the GID-based point sources and the EDGAR emission maps were combined for constructing GRACED (Eq. 9). Because the spatial emission patterns derived from GID and EDGAR (with latest updates in 2019) cannot accurately reflect the situation in 2020 and 2021, the NRT TROPOMI NO2 retrievals were used as a proxy for CO2 to capture the daily variability in CO2 emissions, following GRACED 40. After several data processing steps, such as rolling averaging and thresholding, the NO2 data can reasonably indicate the spatial distribution of CO2 sources 48. Table 3 lists the gridded data used for producing GRACED. For the aviation sector, EDGAR's monthly data are used for the spatial distribution (Eq. 10).
Thus, the gridded emissions EmiGrid g, d, s for grid g, date d, and sector s were estimated following Eqs. 9 and 10, where CM c, d, s represents the value of the Carbon Monitor national emission for country c, day d, and sector s; s includes the power, industry, residential, and ground transport sectors, and avi is the aviation sector. GID g, s is the value of the GID gridded CO2 emissions for grid g and sector s. n is the total number of grids within the country, and j is the total number of months. EDGAR g, m, s represents the EDGAR gridded CO2 emissions for grid g, sector s, and the month m to which date d belongs.
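The idea of gridding a national daily total while keeping the spatial sum consistent with the bottom-up value can be sketched as a proportional allocation. This is a simplified illustration under the assumption that each grid cell receives the national total times its share of a spatial proxy (such as a GID or EDGAR map); it is not the GRACED implementation, and the cell keys and values are hypothetical.

```python
def grid_emissions(national_total, proxy):
    """Distribute a national daily sector total over grid cells.

    proxy: mapping of grid-cell id -> non-negative proxy value for that sector.
    Returns a mapping of grid-cell id -> allocated emission.
    """
    proxy_sum = sum(proxy.values())
    if proxy_sum <= 0:
        raise ValueError("proxy map must contain positive values")
    return {cell: national_total * v / proxy_sum for cell, v in proxy.items()}


# Hypothetical 0.1-degree cells identified by (row, col) and a made-up proxy map.
proxy_map = {(0, 0): 2.0, (0, 1): 6.0, (1, 0): 0.0, (1, 1): 2.0}
gridded = grid_emissions(50.0, proxy_map)
```

Summing `gridded.values()` returns the national total, which is the consistency property described in the text.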
City-level spatial disaggregation. The spatial disaggregation is performed by first converting the city area shapefiles (FUA or GADM) into raster datasets with a unique ID assigned to each city. The raster city area maps are then used as masks to extract the matching grid cells in the GRACED emission maps. We then aggregate the emission values of the grid cells that correspond to the same city mask to yield the total sectoral emission value for a given city. For the aviation sector, emissions from all planes within the city's territory are included. The international shipping sector is not included in this dataset because most of the emissions from this sector occur in the open ocean and cannot be allocated to specific cities. The jurisdiction issue also applies to the aviation sector, but we keep the territorial-based allocation approach in the dataset for completeness. We use both the administrative areas and the FUA because boundary definition has always been a problem in compiling city-level inventories 17. The administrative city areas in most countries do not reflect emissions from the larger commuting zones of a city, which may constitute a large part of the emissions, whereas the FUA represents the most viable spatial dataset for covering the more complete urban areas. In addition, the FUA is clearly defined and produced using a consistent method for cities worldwide, while the definition of administrative city areas may vary significantly across countries. Therefore, the use of both spatial scopes represents a potential solution for differentiating administrative emissions from community-wide emissions and makes inter-dataset comparisons easier, as demonstrated in the validation section.
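The mask-and-aggregate step lends itself to a short sketch. It assumes the city boundaries have already been rasterized onto the same grid as the emission map, with one integer ID per city and 0 for cells outside any city; the nested-list grids and ID values are hypothetical stand-ins for real raster arrays.

```python
def aggregate_by_city(emission_grid, city_id_grid, nodata=0):
    """Sum gridded emissions over the cells belonging to each city mask.

    emission_grid: 2-D list of emissions per grid cell (one sector, one day).
    city_id_grid:  2-D list of the same shape holding a city ID per cell.
    """
    totals = {}
    for emis_row, id_row in zip(emission_grid, city_id_grid):
        for emis, city_id in zip(emis_row, id_row):
            if city_id != nodata:
                totals[city_id] = totals.get(city_id, 0.0) + emis
    return totals


# Hypothetical 2 x 2 grid: city 1 covers the top row, city 2 one bottom cell.
emissions = [[1.0, 2.0], [3.0, 4.0]]
city_ids = [[1, 1], [0, 2]]
city_totals = aggregate_by_city(emissions, city_ids)
```

Running the same function once with FUA-derived IDs and once with GADM-derived IDs would give the two spatial scopes discussed above.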
City-level corrections. The disaggregation based on EDGAR spatial distributions is insufficient, especially for the residential and ground transport sectors, because EDGAR disaggregates national sectoral totals by population for the residential sector and by road network for ground transport, which introduces bias at the city level. Therefore, we correct these two sectors at the individual city level. The ground transport sector emissions are corrected using city-specific TomTom data (by applying Eqs. 4 and 5 at the city scale) for the 416 cities worldwide that have their own NRT TomTom indices (the list of these cities can be found in the documentation on the Carbon Monitor website), which yields more accurate ground transport emission estimates for these cities. For cities that do not have their own TomTom data, we spatially disaggregate the national mean estimates following Eqs. 4, 5, and 9. The residential sector is corrected using city-level HDD to overcome the bias of downscaling from the national inventory. Specifically, we first calculate the daily mean HDD for each city from the population-weighted HDD grid (Eq. 7), and then use it as the baseline to compute a correction factor for each city by comparing it with the mean national HDD to update the emissions for the residential sector:

$$Emis_{res,i} = Emis0_{res,i} \times \frac{HDD_i}{HDD_c} \qquad (11)$$

where Emis res, i and Emis0 res, i represent the corrected and uncorrected residential emissions, respectively, for city i, HDD c is the mean daily HDD for the country, and HDD i is the mean daily HDD for city i.
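A minimal sketch of the residential correction, assuming the correction factor is the simple ratio of the city's mean daily HDD to the national mean daily HDD. Leaving the value unchanged when the national HDD is zero is an added assumption to avoid division by zero; the numbers are hypothetical.

```python
def corrected_residential(emis0, hdd_city, hdd_country):
    """Scale uncorrected city residential emissions by the city/country HDD ratio."""
    if hdd_country <= 0:
        # Assumption: no national heating demand, so no correction is applied.
        return emis0
    return emis0 * hdd_city / hdd_country


# Hypothetical city that is colder than the national average.
colder_city = corrected_residential(100.0, hdd_city=12.0, hdd_country=10.0)
```

A city colder than the national mean is scaled up and a warmer one is scaled down, which is the direction of the bias that population-based downscaling alone cannot capture.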
Outlier correction. Outliers exist in the data mainly due to errors in the source datasets, such as mistakes in unit conversions or data entry. To correct these outliers, we apply a statistical method based on intrinsic properties of the distribution of the emissions in the database. This allows more accurate identification of outliers that are likely to be the result of incorrect data entry. Similar statistical approaches have been successfully applied to correct for outliers in emission datasets 25. The outlier identification method is based on the standard deviation (STD). Specifically, we consider an emission value an outlier if the differences between the current value and both of its daily neighbours are greater than 3 times the yearly STD for that sector (Eq. 12):

$$|Emis_{s,d} - Emis_{s,d-1}| > 3 \times STD_s \quad \text{and} \quad |Emis_{s,d} - Emis_{s,d+1}| > 3 \times STD_s \qquad (12)$$

where Emis s, d is the emission of sector s on day d and STD s is the yearly standard deviation of the daily emissions of sector s. This threshold was determined by experimenting with data with known errors and data with high periodic variation (e.g., weekdays versus weekends for the ground transport emissions). These experiments determined the lower and upper bounds of the threshold such that it correctly identifies outliers and keeps the inherent variance within the data.

Table 5. Summary of city emission datasets (Δ is uncertainty) and comparison statistics including coefficient of determination (R 2 ), mean relative difference (Rd), and sample size (n) when compared with CM-Cities.

Limitations and future work. This dataset focuses on improving the timeliness, temporal resolution, and coverage of city-level inventories for studying NRT emission dynamics, and on providing emission estimates for many cities in low-income regions. It does not account for emissions related to land use, land use change, waste, and forestry; therefore, some emissions caused by long-term urban expansion are not captured. Because of data availability, the dataset is constructed from daily activity data and models that cover the majority, rather than the entirety, of daily emission-related activities. We therefore acknowledge that a small portion of the daily variations in city emissions is not reflected in this dataset. The dataset is derived from the gridded Carbon Monitor, which is based on downscaled national inventories combined with point sources and spatial distributions from GID and EDGAR. One limitation is therefore the lack of city-specific bottom-up activity data except for the ground transport sector, which may introduce additional uncertainties.
We also note that some input data may contain inherent errors and missing values (other than the above-mentioned outliers), especially for cities in less developed nations. We do not intend to fix this kind of error in the source data without enough background information on the specific city, but we consider that our results represent a meaningful first-order estimate for many of these cities, which lack any emission inventory. Estimating NRT daily emissions for cities is a relatively new research direction and requires ongoing efforts to calibrate and update the workflow to improve data quality. Further validation of the data is a crucial next step, and we plan to conduct more data validation and quality improvements through multiple follow-up works. From a bottom-up perspective, we are collecting more city-level fossil fuel consumption data to better constrain the annual or monthly total emissions. From a top-down perspective, we plan to compare our results with field observations and satellite retrieval data. As proposed by a previous study 1, remotely sensed urban atmospheric measurements can help estimate and predict CO2 emission fluxes, and we plan to leverage these research outcomes to improve this city-level emissions dataset. This will require considerably more effort to collect and harmonize inventories with observations 49, but some progress is being made. For example, several observation systems are being designed to monitor megacity CO2 domes, and surface-based observations of atmospheric CO2 are commercially available for some major cities 1. Future work will compare our results with observations from a set of surface, airborne, and satellite sensors, which would provide a foundation for more accurate validation of bottom-up city emission inventories.

Data Records
CM-Cities provides scope-1 NRT city-level emission inventories from 01/01/2019 to 31/12/2021 for 1500 cities in 46 countries. All data have gone through a validation process, in which we estimated the uncertainties and corrected errors. The attributes of the final dataset are listed in Table 4, and the emission data are organized into spreadsheets. The definitions of sectors are consistent with the Carbon Monitor national inventories. Brief descriptions of the methods, sectors, coverage, and uncertainty are also provided in Table 5. The latest updates for selected cities and related information are available for viewing and download on our website https://cities.carbonmonitor.org. At the time of writing, this dataset has been updated to December 31, 2021, and the full dataset can be downloaded from Figshare 50. Future updates will also be available on our website.
• The file that contains functional urban area results for all cities (carbon-monitor-cities-all-cities-FUA.csv) has 8,114,886 data records (some cities have missing values). Separate data files are also provided for each of the 46 countries (carbon-monitor-cities-"CountryName".csv).
• The file that contains all administrative area results for Chinese cities (carbon-monitor-cities-China-GADM-prefecture.csv) has 1,885,120 data records, covering 344 prefecture-level cities with 5480 data records each.
Data examples. Daily CO2 emission variations from a city reveal its geographic and socio-economic characteristics. Figure 4 shows the sectoral breakdown of daily CO2 emissions for some major cities in different regions of the world, including East Asia, the Middle East, Southeast Asia, West Europe, East Europe, Oceania, South America, North America, and Africa.
As an example of the geographic influence on emissions, cities in the Southern Hemisphere, such as Sydney in Australia and Cape Town in South Africa, exhibit higher emissions in the residential sector during the southern winter (northern summer) due to the increase in heating demand. The emissions from the power sector show a surge in summer for many cities, which is likely due to the increased power consumption for cooling. Daily emissions also reveal certain events such as holidays and the COVID-19 outbreak. As an example, we show the impact of COVID-19 on city emissions for Greater New York in the U.S. and Ahmedabad in India (Fig. 5). Note the significant drop in emissions for the ground transport sector in spring 2020 (as indicated by the dashed lines) during the lockdown period, and during the 2021 second wave of the COVID-19 pandemic in India. Figure 6 depicts the total daily emissions for the year 2020 versus the year 2021 for selected cities. Comparing spring 2020 with spring 2021 shows that, for these cities, emissions rebounded from the lower levels caused by the COVID-19 pandemic. Subplots for the city of Moscow are presented as an example of the seasonal and weekly patterns of city-level emissions for different sectors. They highlight the advantage of low latency and demonstrate that this high temporal resolution dataset can be very useful for investigating weekly and seasonal variations in city emissions.

Technical Validation
The quality of this dataset is evaluated by comparing it against existing datasets (Table 5). We also performed uncertainty analysis for our data and for each sector based on a synthesis analysis of input data uncertainties and the methodology used. Significant outliers were identified and corrected as shown in the examples of Fig. 7. The outlier occurrence rate for this dataset is 0.012%.
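The neighbour-based three-STD rule described in the Outlier correction section can be sketched as follows. The detection criterion follows the text; the use of the population standard deviation and the replacement of a flagged value by the mean of its two neighbours are assumptions made for illustration, since the correction step is not specified here.

```python
import statistics


def find_outliers(series, k=3.0):
    """Indices whose value differs from BOTH daily neighbours by more than k * STD."""
    std = statistics.pstdev(series)  # assumption: population STD over the series
    return [
        i
        for i in range(1, len(series) - 1)
        if abs(series[i] - series[i - 1]) > k * std
        and abs(series[i] - series[i + 1]) > k * std
    ]


def correct_outliers(series, k=3.0):
    """Return a copy with flagged values replaced by the mean of their neighbours."""
    fixed = list(series)
    for i in find_outliers(series, k):
        fixed[i] = (series[i - 1] + series[i + 1]) / 2  # hypothetical correction
    return fixed


# Hypothetical series: flat emissions with one spike, e.g. a unit-conversion error.
daily_series = [10.0] * 50
daily_series[25] = 1000.0
flagged = find_outliers(daily_series)
cleaned = correct_outliers(daily_series)
```

Requiring both neighbours to differ by more than the threshold means a genuine step change, where only one side differs, is not flagged.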
Validation against other datasets. Multiple datasets are used to validate our results, including 1. city inventories from the CDP-ICLEI Track, 2. the Vulcan dataset for US counties, 3. the CEADs and MEIC datasets for China, and 4. individual reports released by city governments. Note that only scope-1 emissions are compared. For cities in China, we validated our dataset by comparing it with the CEADs and MEIC datasets. CEADs provides annual provincial emission inventories for China for 2019, and we validated the data for each province by summing the emissions from all prefecture-level cities in that province (for China, GADM level-2 corresponds exactly to the area of prefecture-level cities, and the sum of all prefecture-level cities within a province equals the total area of that province). Figure 8 depicts the comparison results for all the Chinese provinces, including municipalities and most autonomous regions. The result indicates good agreement between CEADs and CM-Cities, with less than 10% difference in annual emissions for most of the provinces. Statistics (Table 5) indicate that the coefficient of determination (R 2 ) values between CEADs and CM-Cities are 0.96, 0.76, and 0.92 for the total, power, and industry sectors, respectively, and the corresponding mean relative difference (Rd) values are 11%, 30%, and 28%. R 2 values between MEIC and CM-Cities are 0.93 and 0.62 for the power sector and the ground transport sector, respectively, and the corresponding Rd values are 21% and 31%. Other sectors were not compared due to the large differences in sector definition and coverage. The mean relative differences are all within the uncertainty ranges, which indicates a relatively high accuracy for Chinese cities.
[...] datasets and compared city inventories from all available data sources regardless of the time of accounting. Note that GADM level-2 in the United States corresponds exactly to the area of counties, so our GADM results were used for comparison with Vulcan county-level inventories. Figure 9 shows examples of the annual total emission comparisons between CM-Cities and these datasets.
The comparison between Vulcan (2015) and CM-Cities (2019) covers the top 50 counties with the highest emissions in the United States. We used the coefficient of determination (R 2 ) and the mean relative difference (Rd) to evaluate the comparison results. R 2 values are 0.82, 0.60, 0.58, 0.82, 0.90, and 0.69 for the total, power, industry, residential (and commercial buildings), ground transport, and aviation sectors, respectively, and the corresponding Rd values are 26%, 114%, 67%, 35%, 41%, and 58% (Table 5). The differences are mainly due to 1. the difference in the year of accounting: the earliest CM-Cities estimates (for 2019) are compared with the latest Vulcan estimates (for 2015), and multiple factors that govern emissions could have changed during this period; 2. the different accounting methods: CM-Cities uses a territorial downscaling approach, while Vulcan uses a consumption-based bottom-up accounting approach; and 3. the differences in sector coverage definitions and source data (Tables 1, 2, 5), which partially explain why the total emission comparison shows better agreement than the sectoral comparisons.
Direct comparisons with CDP-ICLEI Track were difficult for several reasons: (1) CDP-ICLEI Track inventories are self-reported by cities and were typically estimated using different methods. (2) Most cities follow the GPC protocol and report by scope rather than by sector; therefore, we only compared total emissions. (3) CDP-ICLEI Track has not independently calculated the uncertainty range for these self-reported inventories, and self-reported uncertainties are expected to be variable. For example, 45% of cities reported "high confidence" in their emissions data for 2021, 35% reported "medium confidence", and 3% reported "low confidence". (4) Spatial coverage is unclear, as the definition of a "city" can vary across countries: some cities report based on administrative areas while others include adjacent areas, and no shapefiles or raster maps were provided to clarify the exact city boundary or area of accounting. Nonetheless, we performed comparisons for 24 large cities in different regions with available CDP-ICLEI Track inventories; the total emission comparisons for these cities show agreement with R² = 0.74 and Rd = 31% (Fig. 9, Table 5).
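The two statistics used in these comparisons can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the text does not give the exact formulas, so the sketch assumes that R² is the squared Pearson correlation between paired annual totals and that Rd is the mean of |estimate − reference| / reference. The function names and the sample values are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two paired samples (assumed definition of R²)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy ** 2 / (s_xx * s_yy)


def mean_relative_difference(estimate, reference):
    """Mean of |estimate - reference| / reference (assumed definition of Rd)."""
    pairs = list(zip(estimate, reference))
    return sum(abs(e - r) / r for e, r in pairs) / len(pairs)


# Hypothetical annual totals for four areas; these are NOT values from the dataset.
cm_cities = [120.0, 45.0, 300.0, 80.0]
reference = [110.0, 50.0, 280.0, 95.0]

print(f"R2 = {r_squared(cm_cities, reference):.2f}")
print(f"Rd = {mean_relative_difference(cm_cities, reference):.1%}")
```

Under these assumed definitions, R² measures how well the two inventories rank and scale together, while Rd measures the typical size of the disagreement, which is why a sector can show a high R² and a large Rd at the same time.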
Uncertainty analysis. The uncertainties in this dataset have two sources: (1) the uncertainties inherited from Carbon Monitor and GRACED, and (2) the uncertainty introduced by the spatial downscaling process. The uncertainty analysis was conducted based on the 2006 IPCC Guidelines for National Greenhouse Gas Inventories. For the power sector, uncertainty mainly comes from the emission factor and the variability of the energy mix for power generation; the 1-sigma uncertainty of power emissions from fossil fuel is estimated as ±10.0%. For the industry sector, monthly production data are the main source of uncertainty, especially production in China, which accounts for more than 60% of the world's total industrial CO₂ emissions. Monte Carlo simulations were used to determine the confidence interval based on regression models between estimated monthly emissions and officially reported emissions. The 1-sigma uncertainty for the industry sector is estimated as ±36.0%. For the ground transport sector, uncertainty is estimated by applying the regression between the TomTom congestion index and traffic flux to cities other than Paris. The 1-sigma uncertainty for the ground transport sector is estimated as ±9.3%. For the residential sector, the uncertainty is calculated based on comparisons between estimated emissions and consumption-based accounting results for several countries in Europe. The 1-sigma uncertainty for the residential sector is estimated as ±40.0%. For the aviation sector, the 1-sigma uncertainty is estimated as ±10.2%. These uncertainty estimates follow the methods used by Carbon Monitor 10 .
Spatial downscaling introduces additional uncertainty because of the rasterization of city areas. Spatial computations are based on raster (gridded) files, but most cities and urban areas have irregularly shaped boundaries that do not align exactly with grid cells. Area discrepancies are found along all city boundaries, and smaller cities typically show larger discrepancies because a few grid cells account for a large portion of the total urban area. We computed the area discrepancies for all cities in the dataset (Fig. 10) and found that 44.53% of cities show an area difference of 0%-5%, with the count decreasing as the discrepancy percentage increases. The mean area discrepancy for all cities is 13.55%. We then estimated the overall uncertainty by first applying the error propagation equation provided by IPCC 51 , and then combining the uncertainties of all sectors and the city area uncertainty. In this calculation, U_s and a_s are the percentage uncertainty and the quantity (daily mean emissions) for sector s, respectively, and U_a is the city area uncertainty. Finally, the overall annual uncertainty range of CM-Cities is estimated as ±21.66%.

Fig. 10 The frequency distribution of city area (boundary) uncertainty ranges in the dataset. The mean area uncertainty for all cities is 13.55%.
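The propagation step can be sketched as follows. The sector-combination rule below is the standard IPCC Approach 1 formula for summed quantities, U_total = sqrt(Σ (U_s · a_s)²) / |Σ a_s|. How the city area uncertainty U_a enters is not recoverable from the text, so adding it in quadrature is an assumption made here for illustration only. The daily mean emissions are hypothetical placeholders, so the printed value is not expected to reproduce the reported ±21.66%.

```python
from math import sqrt


def combine_sector_uncertainty(percent_unc, quantity):
    """IPCC Approach 1 rule for a sum: sqrt(sum((U_s * a_s)^2)) / |sum(a_s)|."""
    numerator = sqrt(sum((percent_unc[s] * quantity[s]) ** 2 for s in percent_unc))
    return numerator / abs(sum(quantity[s] for s in percent_unc))


def add_in_quadrature(*uncertainties):
    """Root-sum-of-squares combination of independent percentage uncertainties."""
    return sqrt(sum(u ** 2 for u in uncertainties))


# 1-sigma sector uncertainties (%) as reported in the text.
SECTOR_UNCERTAINTY = {
    "power": 10.0,
    "industry": 36.0,
    "ground_transport": 9.3,
    "residential": 40.0,
    "aviation": 10.2,
}

# HYPOTHETICAL daily mean emissions per sector (ktCO2/day); not taken from the dataset.
hypothetical_emissions = {
    "power": 40.0,
    "industry": 30.0,
    "ground_transport": 18.0,
    "residential": 10.0,
    "aviation": 2.0,
}

AREA_UNCERTAINTY = 13.55  # mean city area discrepancy (%) from the text

u_sectors = combine_sector_uncertainty(SECTOR_UNCERTAINTY, hypothetical_emissions)
# ASSUMPTION: the area term is combined in quadrature with the sector term.
u_overall = add_in_quadrature(u_sectors, AREA_UNCERTAINTY)
print(f"sectors: ±{u_sectors:.2f}%, overall: ±{u_overall:.2f}%")
```

Because each sector term is weighted by its emissions, sectors with large uncertainty but a small share of emissions contribute little to the total, and the combined sector uncertainty is smaller than the largest single-sector value.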
Uncertainties at a daily scale are also estimated for each sector (Table 6). For example, the daily uncertainty from the power sector is estimated by comparing our results with emissions derived from real load curve data in several cities. Daily uncertainty for the ground transportation sector has two components: the regression model and the daily allocation of CO₂ emissions by traffic flow. Quantifying the uncertainty of the daily-scale allocation of emissions requires real daily emissions from ground transportation, which are difficult to obtain, so this component is ignored in this study. We therefore focus on the uncertainty generated by the regression model, which we estimate using the 95% confidence interval of the model. Given the high temporal resolution of this dataset, the uncertainties from daily activity add to the overall uncertainties on top of the annual uncertainties; this is not inconsistent, as the data are temporally autocorrelated.

Usage Notes
The generated datasets 50 are available from https://doi.org/10.6084/m9.figshare.19425665.v1. The main data file has more than one million lines of data, which will take a long time to load in Excel. We recommend loading the data with a script that can handle large datasets. We have provided example Python code to help users read in and plot emissions for any city in the dataset (https://github.com/dh107/Carbon-Monitor-Cities/). Note that the raw TomTom and Flightradar24 data are not included in this dataset, as we only provide the estimated emissions. Users should also note that the unit of emissions in this dataset is ktCO₂. The filename indicates whether the data are based on administrative areas (GADM) or functional urban areas (default). The next update to this dataset is scheduled for May 2022, which will extend the dataset to Feb 28, 2022.
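As a minimal sketch of the script-based approach recommended above, the following Python code streams a CSV file row by row and sums daily emissions across sectors for one city, so the full file never has to be held in memory. It is not the example code from the repository. The column names (`city`, `date`, `sector`, `value`), the date format, and the sample rows are hypothetical and should be checked against the actual file header before use.

```python
import csv
import io
from collections import defaultdict


def daily_totals(rows, city):
    """Sum emissions (ktCO2) over all sectors for one city, keyed by date.

    `rows` is any iterable of dicts, for example a csv.DictReader.
    The column names used here are ASSUMED, not taken from the dataset.
    """
    totals = defaultdict(float)
    for row in rows:
        if row["city"] == city:
            totals[row["date"]] += float(row["value"])
    return dict(totals)


# Tiny in-memory stand-in for the real file; all values are made up.
SAMPLE = """city,date,sector,value
Exampleville,2019-01-01,Power,1.5
Exampleville,2019-01-01,Industry,0.5
Exampleville,2019-01-02,Power,1.25
Exampleville,2019-01-02,Industry,0.5
Othertown,2019-01-01,Power,3.0
"""

totals = daily_totals(csv.DictReader(io.StringIO(SAMPLE)), "Exampleville")
for date, value in sorted(totals.items()):
    print(date, value, "ktCO2")
```

For the downloaded file, the same function can be fed with `csv.DictReader(open(path, newline=""))`, where `path` points to the local copy of either the GADM-based or the functional-urban-area file.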

Code availability
Python code for producing, reading, and plotting data for any city in the dataset is provided at https://github.com/dh107/Carbon-Monitor-Cities/.